SNP detection for massively parallel whole-genome resequencing.

Authors

  • Ruiqiang Li
  • Yingrui Li
  • Xiaodong Fang
  • Huanming Yang
  • Jian Wang
  • Karsten Kristiansen
  • Jun Wang
Abstract

Next-generation massively parallel sequencing technologies provide ultrahigh throughput at a unit cost two orders of magnitude lower than that of capillary Sanger sequencing. One of the key applications of next-generation sequencing is studying genetic variation between individuals by whole-genome or targeted-region resequencing. Here, we have developed a consensus-calling and SNP-detection method for the sequencing-by-synthesis Illumina Genome Analyzer technology. We designed this method by carefully considering the data quality, alignment, and experimental errors common to this technology. All of this information was integrated under Bayesian theory into a single quality score for each base, which measures the accuracy of consensus calling. We tested this methodology on a large-scale human resequencing data set with 36x coverage and assembled a high-quality nonrepetitive consensus sequence covering 92.25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with alleles genotyped on the Illumina human 1M BeadChip from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of the 999,981 genotyped alleles on the autosomes were covered, with 99.97% and 99.84% consistency, respectively. At low sequencing depth, we used the prior probabilities of dbSNP alleles and were able to significantly improve coverage of dbSNP sites compared with a nonimputation model. Our analyses demonstrate that the method has a very low false call rate at any sequencing depth and excellent genome coverage at high sequencing depth.
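The abstract's core idea is to combine per-base error probabilities into a posterior over diploid genotypes, report a Phred-scaled consensus quality, and optionally use dbSNP-informed priors at low depth. Below is a minimal sketch of that kind of Bayesian caller. It is not the authors' implementation. It assumes independent reads, sequencing errors split evenly over the three other bases, and hypothetical prior parameters (`het`, `hom`, `other`, plus a Hardy-Weinberg prior built from a dbSNP allele frequency). It also ignores the alignment and platform-specific error modelling the paper describes.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"
# The 10 unordered diploid genotypes, each a sorted 2-letter string ("AA", "AC", ...).
GENOTYPES = ["".join(p) for p in combinations_with_replacement(BASES, 2)]


def base_likelihood(observed, allele, q):
    """P(observed base | true allele) for Phred quality q (errors split evenly over 3 bases)."""
    err = 10 ** (-q / 10)
    return 1 - err if observed == allele else err / 3


def make_priors(ref, het=1e-3, hom=5e-4, other=1e-6, dbsnp=None):
    """Hypothetical genotype priors; dbsnp=(alt_base, alt_freq) with 0 < alt_freq < 1."""
    priors = {}
    for g in GENOTYPES:
        n_ref = g.count(ref)
        if n_ref == 2:
            priors[g] = 1.0          # reference homozygote
        elif n_ref == 1:
            priors[g] = het / 3      # one reference allele, one of 3 alternatives
        elif g[0] == g[1]:
            priors[g] = hom / 3      # non-reference homozygote
        else:
            priors[g] = other        # two different non-reference alleles
    if dbsnp is not None:
        alt, p = dbsnp
        # Known site: Hardy-Weinberg genotype frequencies from the dbSNP allele frequency.
        hwe = {ref + ref: (1 - p) ** 2, ref + alt: 2 * p * (1 - p), alt + alt: p ** 2}
        for g, v in hwe.items():
            priors["".join(sorted(g))] = v
    total = sum(priors.values())
    return {g: v / total for g, v in priors.items()}


def genotype_posteriors(reads, priors):
    """reads: list of (base, phred_quality) pairs at one site, assumed independent."""
    log_post = {}
    for g, prior in priors.items():
        a1, a2 = g
        log_lik = 0.0
        for base, q in reads:
            # Each read samples either chromosome with probability 1/2.
            log_lik += math.log(0.5 * base_likelihood(base, a1, q)
                                + 0.5 * base_likelihood(base, a2, q))
        log_post[g] = log_lik + math.log(prior)
    # Normalize in log space to avoid underflow at high depth.
    top = max(log_post.values())
    z = sum(math.exp(v - top) for v in log_post.values())
    return {g: math.exp(v - top) / z for g, v in log_post.items()}


def call_consensus(reads, ref, **prior_kwargs):
    """Return (best genotype, Phred-scaled consensus quality capped at 99)."""
    post = genotype_posteriors(reads, make_priors(ref, **prior_kwargs))
    best = max(post, key=post.get)
    err = max(1.0 - post[best], 1e-10)   # floor keeps log10 finite
    return best, min(99, round(-10 * math.log10(err)))


if __name__ == "__main__":
    print(call_consensus([("A", 30)] * 5 + [("G", 30)] * 5, ref="A"))
    print(call_consensus([("G", 30)], ref="A"))
    print(call_consensus([("G", 30)], ref="A", dbsnp=("G", 0.3)))
```

Balanced high-quality A and G reads give a confident heterozygous call. With a single Q30 G read at a reference A site, the default prior still calls the reference homozygote at low quality. A dbSNP prior (alternative allele G at frequency 0.3) tips the call to the heterozygote. This is a toy version of the low-depth imputation effect described above.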

Similar resources

SNP discovery in apple cultivars using next generation sequencing

Background: Knowledge about single nucleotide polymorphism (SNP) markers is extremely important in the development of genotyping assays, allowing improvements in plant breeding through marker-assisted selection. With the emergence of next generation sequencing platforms, high-density SNP discovery in the genome of plant crops becomes more achievable. In this project, we carried out whole genome ...

Does massively parallel DNA resequencing signify the end of histopathology as we know it?

Next-generation DNA sequencing devices have revolutionized cancer genomics by bringing whole genome resequencing of patients' tumours within practical and economic reach. We present an overview of the techniques involved and review early results from the resequencing of cancer genomes. The possible impacts of whole-genome and transcriptome resequencing in clinical cancer research and the practic...

Target Amplicon Sequencing for Genotyping Genome-Wide Single Nucleotide Polymorphisms Identified by Whole-Genome Resequencing in Peanut.

Genome-wide genotyping data regarding breeding materials are essential resources for improving breeding efficiency, especially in plants with complex genomes with a high degree of polyploidy. Several current breeding efforts in cultivated peanut ( L.), which has a tetraploid genome, are devoted to developing high oleic acid cultivars. Genetic maps for such breeding programs have been developed ...

Genome-Wide SNP Calling Using Next Generation Sequencing Data in Tomato

The tomato (Solanum lycopersicum L.) is a model plant for genome research in Solanaceae, as well as for studying crop breeding. Genome-wide single nucleotide polymorphisms (SNPs) are a valuable resource in genetic research and breeding. However, most methods for discovering genome-wide SNPs require expensive high-depth sequencing. Here, we describe a method for SNP calling using a modified ...

NpgRJ_Nmeth_1179 183..188

Massively parallel sequencing instruments enable rapid and inexpensive DNA sequence data production. Because these instruments are new, their data require characterization with respect to accuracy and utility. To address this, we sequenced a Caenorhabditis elegans N2 Bristol strain isolate using the Solexa Sequence Analyzer, and compared the reads to the reference genome to characterize the dat...

Journal:
  • Genome Research

Volume 19, Issue 6

Published: 2009